The Dependence of Effective Planning Horizon on Model Accuracy
Authors
Abstract
For Markov decision processes with long horizons (i.e., discount factors close to one), it is common in practice to use reduced horizons during planning to speed up computation. However, perhaps surprisingly, when the model available to the agent is estimated from data, as will be the case in most real-world problems, the policy found using a shorter planning horizon can actually be better than a policy learned with the true horizon. In this paper we provide a precise explanation for this phenomenon based on principles of learning theory. We show formally that the planning horizon is a complexity control parameter for the class of policies to be learned. In particular, the planning horizon has an intuitive, monotonic relationship with a simple counting measure of complexity, and a similar relationship can be observed empirically with a more general, data-dependent Rademacher complexity measure. Each complexity measure gives rise to a bound on the planning loss predicting that a planning horizon shorter than the true horizon can reduce overfitting and improve test performance, and we confirm these predictions empirically.
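The effect described in the abstract can be explored in a small simulation. The sketch below is a hypothetical setup, not the authors' experimental protocol. It assumes a random MDP with known rewards in [0, 1] and a certainty-equivalence model built from `n_samples` sampled transitions per state-action pair. It measures the planning loss as the largest per-state gap, in the true model at the evaluation discount `gamma_eval`, between the optimal value and the value of the policy planned in the estimated model with a possibly smaller `gamma_plan`. All function names (`random_mdp`, `estimate_model`, `plan`, `evaluate`, `planning_loss`) are illustrative.

```python
import random

def random_mdp(n_states, n_actions, rng):
    """Hypothetical generator: random transition rows and rewards in [0, 1]."""
    P, R = [], []
    for _ in range(n_states):
        rows, rews = [], []
        for _ in range(n_actions):
            w = [rng.random() for _ in range(n_states)]
            total = sum(w)
            rows.append([x / total for x in w])
            rews.append(rng.random())
        P.append(rows)
        R.append(rews)
    return P, R

def estimate_model(P, n_samples, rng):
    """Certainty-equivalence model: empirical next-state frequencies from
    n_samples draws per (s, a). Rewards are assumed known (an assumption)."""
    n_states = len(P)
    P_hat = []
    for s in range(n_states):
        rows = []
        for a in range(len(P[s])):
            draws = rng.choices(range(n_states), weights=P[s][a], k=n_samples)
            counts = [0] * n_states
            for t in draws:
                counts[t] += 1
            rows.append([c / n_samples for c in counts])
        P_hat.append(rows)
    return P_hat

def backup(P, R, V, gamma, s, a):
    """One-step Bellman backup for state s and action a."""
    return R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))

def plan(P, R, gamma, tol=1e-10):
    """Value iteration on (P, R, gamma); returns the greedy policy."""
    n_states, n_actions = len(P), len(P[0])
    V = [0.0] * n_states
    while True:
        V_new = [max(backup(P, R, V, gamma, s, a) for a in range(n_actions))
                 for s in range(n_states)]
        done = max(abs(x - y) for x, y in zip(V, V_new)) < tol
        V = V_new
        if done:
            break
    return [max(range(n_actions), key=lambda a: backup(P, R, V, gamma, s, a))
            for s in range(n_states)]

def evaluate(P, R, policy, gamma, tol=1e-10):
    """Iterative policy evaluation of a fixed policy in (P, R, gamma)."""
    V = [0.0] * len(P)
    while True:
        V_new = [backup(P, R, V, gamma, s, policy[s]) for s in range(len(P))]
        done = max(abs(x - y) for x, y in zip(V, V_new)) < tol
        V = V_new
        if done:
            return V

def planning_loss(P, R, P_hat, gamma_plan, gamma_eval):
    """max_s [V*(s) - V^pi(s)] in the TRUE model at gamma_eval, where pi is
    planned in the ESTIMATED model P_hat with discount gamma_plan."""
    v_opt = evaluate(P, R, plan(P, R, gamma_eval), gamma_eval)
    v_pi = evaluate(P, R, plan(P_hat, R, gamma_plan), gamma_eval)
    return max(a - b for a, b in zip(v_opt, v_pi))

if __name__ == "__main__":
    rng = random.Random(0)
    P, R = random_mdp(10, 2, rng)
    P_hat = estimate_model(P, n_samples=5, rng=rng)
    for g in (0.0, 0.3, 0.6, 0.9, 0.99):
        loss = planning_loss(P, R, P_hat, g, 0.99)
        print(f"gamma_plan={g:.2f}  loss={loss:.4f}")
```

With few samples per state-action pair, a sweep like this may show the loss at some intermediate `gamma_plan` falling below its value at `gamma_plan = gamma_eval`, as the abstract predicts; whether it does in a given run depends on the seed, the MDP size, and `n_samples`, so a single run should be treated as anecdotal.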
Similar resources
Adaptive aggregate production planning with fuzzy goal programming approach
Aggregate production planning (APP) determines the optimal production plan for the medium term planning horizon. The purpose of the APP is effective utilization of existing capacities through facing the fluctuations in demand. Recently, fuzzy approaches have been applied for APP focusing on vague nature of cost parameters. Considering the importance of coping with customer demand in different p...
A multi-stage stochastic programming for condition-based maintenance with proportional hazards model
Condition-Based Maintenance (CBM) optimization using Proportional Hazards Model (PHM) is a kind of maintenance optimization problem in which inspections of a system relevant to its failure rate depending on the age and value of covariates are performed in time intervals. The general approach for constructing a CBM based on PHM for a system is to minimize a long run average cost per unit of time...
A production-inventory model with permissible delay incorporating learning effect in random planning horizon using genetic algorithm
This paper presents a production-inventory model for deteriorating items with stock-dependent demand under inflation in a random planning horizon. The supplier offers the retailer fully permissible delay in payment. It is assumed that the time horizon of the business period is random in nature and follows exponential distribution with a known mean. Here learning effect is also introduced for th...
A Flow shop Production Planning Problem with basic period policy and Sequence Dependent set up times
Many authors have examined lot sizing, scheduling and sequence of multi-product flow shops, but most of them have assumed that set up times are independent of sequence. Whereas dependence of set up times to sequence is more common in practice. Hence, in this paper, we examine the discussed problem with hypothesis of dependence of set up times to sequence and cyclic schedule policy in basic peri...
An Investigation into the Effects of Joint Planning on Complexity, Accuracy, and Fluency across Task Complexity
The current study aimed to examine the effects of strategic planning, online planning, strategic planning and online planning combined (joint planning), and no planning on the complexity, accuracy, and fluency of oral productions in two simple and complex narrative tasks. Eighty advanced EFL learners performed one simple narrative task and a complex narrative task with 20 minutes in between. Th...